\subsubsection{ARM: 8 arguments}

Let's again use the example with 9 arguments from the previous section: \myref{example_printf8_x64}.

\lstinputlisting[style=customc]{patterns/03_printf/2.c}

\myparagraph{\OptimizingKeilVI: \ARMMode}

\begin{lstlisting}[style=customasmARM]
.text:00000028             main
.text:00000028
.text:00000028             var_18 = -0x18
.text:00000028             var_14 = -0x14
.text:00000028             var_4  = -4
.text:00000028
.text:00000028 04 E0 2D E5  STR    LR, [SP,#var_4]!
.text:0000002C 14 D0 4D E2  SUB    SP, SP, #0x14
.text:00000030 08 30 A0 E3  MOV    R3, #8
.text:00000034 07 20 A0 E3  MOV    R2, #7
.text:00000038 06 10 A0 E3  MOV    R1, #6
.text:0000003C 05 00 A0 E3  MOV    R0, #5
.text:00000040 04 C0 8D E2  ADD    R12, SP, #0x18+var_14
.text:00000044 0F 00 8C E8  STMIA  R12, {R0-R3}
.text:00000048 04 00 A0 E3  MOV    R0, #4
.text:0000004C 00 00 8D E5  STR    R0, [SP,#0x18+var_18]
.text:00000050 03 30 A0 E3  MOV    R3, #3
.text:00000054 02 20 A0 E3  MOV    R2, #2
.text:00000058 01 10 A0 E3  MOV    R1, #1
.text:0000005C 6E 0F 8F E2  ADR    R0, aADBDCDDDEDFDGD ; "a=%d; b=%d; c=%d; d=%d; e=%d; f=%d; g=%"...
.text:00000060 BC 18 00 EB  BL     __2printf
.text:00000064 14 D0 8D E2  ADD    SP, SP, #0x14
.text:00000068 04 F0 9D E4  LDR    PC, [SP+4+var_4],#4
\end{lstlisting}

This code can be divided into several parts:

\myindex{Function prologue}
\begin{itemize}
\item Function prologue:

\myindex{ARM!\Instructions!STR}
The very first \INS{STR LR, [SP,\#var\_4]!} instruction saves \ac{LR} on the stack, because the \INS{BL} instruction that calls \printf is going to overwrite this register.
The exclamation mark at the end indicates \IT{pre-index}.

This implies that \ac{SP} is to be decreased by 4 first, and then \ac{LR} will be saved at the address stored in \ac{SP}.
This is similar to \PUSH in x86.
Read more about it at: \myref{ARM_postindex_vs_preindex}.

\myindex{ARM!\Instructions!SUB}
The second \INS{SUB SP, SP, \#0x14} instruction decreases \ac{SP} (the \gls{stack pointer}) in order to allocate \GTT{0x14} (20) bytes on the stack.
Indeed, we have to pass 5 32-bit values via the stack to the \printf function, and each one occupies 4 bytes, which is exactly $5*4=20$.
The other 4 32-bit values are to be passed through registers.

\item Passing 5, 6, 7 and 8 via the stack: they are stored in the \Reg{0}, \Reg{1}, \Reg{2} and \Reg{3} registers respectively.\\
Then, the \INS{ADD R12, SP, \#0x18+var\_14} instruction writes the stack address where these 4 variables are to be stored, into the \Reg{12} register.
\myindex{IDA!var\_?}
\IT{var\_14} is an assembly macro, equal to -0x14, created by \IDA to conveniently display the code accessing the stack.
The \IT{var\_?} macros generated by \IDA reflect local variables in the stack.

So, \GTT{SP+4} is to be stored into the \Reg{12} register. \\
\myindex{ARM!\Instructions!STMIA}
The next \INS{STMIA R12, \{R0-R3\}} instruction writes the contents of registers \Reg{0}-\Reg{3} to the memory pointed to by \Reg{12}.
\INS{STMIA} abbreviates \IT{Store Multiple Increment After}.
\IT{\q{Increment After}} implies that the address taken from \Reg{12} is increased by 4 after each register value is written (since there is no exclamation mark after \Reg{12}, the register itself is left unchanged).

\item Passing 4 via the stack: 4 is stored in \Reg{0}, and then this value is saved on the stack with the help of the \\
\INS{STR R0, [SP,\#0x18+var\_18]} instruction.
\IT{var\_18} is -0x18, so the offset is 0, thus the value from the \Reg{0} register (4) is written to the address held in \ac{SP}.

\item Passing 1, 2 and 3 via registers:
the first 3 numbers (a, b, c), equal to 1, 2 and 3 respectively, are placed in the
\Reg{1}, \Reg{2} and \Reg{3}
registers right before the \printf call, and \Reg{0} receives the address of the format string.
The other 5 values have already been passed via the stack.

\item \printf call.

\myindex{Function epilogue}
\item Function epilogue:

The \INS{ADD SP, SP, \#0x14} instruction restores the \ac{SP} pointer back to its former value,
thus annulling everything that has been stored on the stack.
Of course, what has been stored on the stack will stay there, but it will all be overwritten during the execution of subsequent functions.

\myindex{ARM!\Instructions!LDR}
The \INS{LDR PC, [SP+4+var\_4],\#4} instruction loads the saved \ac{LR} value from the stack into the \ac{PC} register, thus causing the function to exit.
There is no exclamation mark---indeed, \ac{PC} is loaded first from the address stored in \ac{SP} ($4+var\_4=4+(-4)=0$, so this instruction is analogous to \INS{LDR PC, [SP],\#4}), and then \ac{SP} is increased by 4.
This is referred to as \IT{post-index}\footnote{Read more about it: \myref{ARM_postindex_vs_preindex}.}.
Why does \IDA display the instruction like that?
Because it wants to illustrate the stack layout and the fact that \GTT{var\_4} is allocated for saving the \ac{LR} value in the local stack.
This instruction is somewhat similar to \INS{POP PC} in x86\footnote{It is impossible to set the \GTT{IP/EIP/RIP} value using \POP in x86, but anyway, you get the analogy.}.

\end{itemize}
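
The stack writes described above can be replayed in a few lines of C.
This is only a sketch under stated assumptions: the stack is modelled as a small array of 32-bit words, \GTT{FAKE\_LR} is a made-up placeholder for the return address, and the function name is hypothetical.
It shows the six words that lie at and above \ac{SP} just before the \printf call.

\begin{lstlisting}[style=customc]
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16
#define FAKE_LR 0xDEADBEEFu  /* placeholder return address (assumption) */

/* Replays the stack writes of the ARM-mode listing.
   out[0..5] receives the six words starting at the final SP,
   lowest address first. */
void model_keil_arm_stack(uint32_t out[6])
{
    uint32_t stack[STACK_WORDS] = {0};
    uint32_t sp = STACK_WORDS * 4;  /* byte offset; the stack grows down */
    uint32_t r0, r1, r2, r3, r12;

    /* STR LR, [SP,#-4]! : pre-index, decrease SP first, then store */
    sp -= 4;
    stack[sp / 4] = FAKE_LR;

    /* SUB SP, SP, #0x14 : reserve 5 words */
    sp -= 0x14;

    /* MOV R3, #8 ; MOV R2, #7 ; MOV R1, #6 ; MOV R0, #5 */
    r3 = 8; r2 = 7; r1 = 6; r0 = 5;

    /* ADD R12, SP, #4 ; STMIA R12, {R0-R3} */
    r12 = sp + 4;
    stack[(r12 + 0)  / 4] = r0;
    stack[(r12 + 4)  / 4] = r1;
    stack[(r12 + 8)  / 4] = r2;
    stack[(r12 + 12) / 4] = r3;

    /* MOV R0, #4 ; STR R0, [SP] */
    r0 = 4;
    stack[sp / 4] = r0;

    memcpy(out, &stack[sp / 4], 6 * sizeof(uint32_t));
}
\end{lstlisting}

Under this model, the words starting at \ac{SP} are 4, 5, 6, 7 and 8, followed by the saved \ac{LR}.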

\myparagraph{\OptimizingKeilVI: \ThumbMode}

\begin{lstlisting}[style=customasmARM]
.text:0000001C             printf_main2
.text:0000001C
.text:0000001C             var_18 = -0x18
.text:0000001C             var_14 = -0x14
.text:0000001C             var_8  = -8
.text:0000001C
.text:0000001C 00 B5        PUSH    {LR}
.text:0000001E 08 23        MOVS    R3, #8
.text:00000020 85 B0        SUB     SP, SP, #0x14
.text:00000022 04 93        STR     R3, [SP,#0x18+var_8]
.text:00000024 07 22        MOVS    R2, #7
.text:00000026 06 21        MOVS    R1, #6
.text:00000028 05 20        MOVS    R0, #5
.text:0000002A 01 AB        ADD     R3, SP, #0x18+var_14
.text:0000002C 07 C3        STMIA   R3!, {R0-R2}
.text:0000002E 04 20        MOVS    R0, #4
.text:00000030 00 90        STR     R0, [SP,#0x18+var_18]
.text:00000032 03 23        MOVS    R3, #3
.text:00000034 02 22        MOVS    R2, #2
.text:00000036 01 21        MOVS    R1, #1
.text:00000038 A0 A0        ADR     R0, aADBDCDDDEDFDGD ; "a=%d; b=%d; c=%d; d=%d; e=%d; f=%d; g=%"...
.text:0000003A 06 F0 D9 F8  BL      __2printf
.text:0000003E
.text:0000003E             loc_3E   ; CODE XREF: example13_f+16
.text:0000003E 05 B0        ADD     SP, SP, #0x14
.text:00000040 00 BD        POP     {PC}
\end{lstlisting}

The output is almost the same as in the previous example. However, this is Thumb code and the values are written to the stack in a different order:
8 goes first, then 5, 6, 7, and 4 goes third.
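
Resolving the \IDA offsets in this listing by hand shows where each value lands (a worked calculation that uses only the \IT{var\_?} values printed in the listing):

\begin{itemize}
\item \INS{STR R3, [SP,\#0x18+var\_8]}: $0x18+(-8)=0x10$, so 8 is written to \GTT{SP+0x10}.
\item \INS{ADD R3, SP, \#0x18+var\_14}: $0x18+(-0x14)=4$, so \INS{STMIA} writes 5, 6 and 7 to \GTT{SP+4}, \GTT{SP+8} and \GTT{SP+0xC}.
\item \INS{STR R0, [SP,\#0x18+var\_18]}: $0x18+(-0x18)=0$, so 4 is written to \GTT{SP}.
\end{itemize}

So only the order of the stores differs: the resulting stack layout (4, 5, 6, 7, 8 from \GTT{SP} upwards) is the same as in ARM mode.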

\myparagraph{\OptimizingXcodeIV: \ARMMode}

\begin{lstlisting}[style=customasmARM]
__text:0000290C             _printf_main2
__text:0000290C
__text:0000290C             var_1C = -0x1C
__text:0000290C             var_C  = -0xC
__text:0000290C
__text:0000290C 80 40 2D E9   STMFD  SP!, {R7,LR}
__text:00002910 0D 70 A0 E1   MOV    R7, SP
__text:00002914 14 D0 4D E2   SUB    SP, SP, #0x14
__text:00002918 70 05 01 E3   MOV    R0, #0x1570
__text:0000291C 07 C0 A0 E3   MOV    R12, #7
__text:00002920 00 00 40 E3   MOVT   R0, #0
__text:00002924 04 20 A0 E3   MOV    R2, #4
__text:00002928 00 00 8F E0   ADD    R0, PC, R0
__text:0000292C 06 30 A0 E3   MOV    R3, #6
__text:00002930 05 10 A0 E3   MOV    R1, #5
__text:00002934 00 20 8D E5   STR    R2, [SP,#0x1C+var_1C]
__text:00002938 0A 10 8D E9   STMFA  SP, {R1,R3,R12}
__text:0000293C 08 90 A0 E3   MOV    R9, #8
__text:00002940 01 10 A0 E3   MOV    R1, #1
__text:00002944 02 20 A0 E3   MOV    R2, #2
__text:00002948 03 30 A0 E3   MOV    R3, #3
__text:0000294C 10 90 8D E5   STR    R9, [SP,#0x1C+var_C]
__text:00002950 A4 05 00 EB   BL     _printf
__text:00002954 07 D0 A0 E1   MOV    SP, R7
__text:00002958 80 80 BD E8   LDMFD  SP!, {R7,PC}
\end{lstlisting}

\myindex{ARM!\Instructions!STMFA}
\myindex{ARM!\Instructions!STMIB}
This is almost the same as what we have already seen, with the
exception of the \INS{STMFA} (Store Multiple Full Ascending) instruction, which is a synonym of the \INS{STMIB} (Store Multiple Increment Before) instruction.
This instruction increases the address taken from the \ac{SP} register and only then writes the next register value into memory, rather than performing those two actions in the opposite order.
Since there is no exclamation mark after \ac{SP}, the register itself keeps its value.
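
The difference between the two variants can be sketched in C.
The helper names \GTT{stm\_ia} and \GTT{stm\_ib} and the word-array model of memory are assumptions made for illustration only.
Each function returns the final address, which is the value that would be written back to the base register if an exclamation mark were present.

\begin{lstlisting}[style=customc]
#include <stddef.h>
#include <stdint.h>

/* Store Multiple Increment After:
   write at the current address, then advance by 4. */
uint32_t stm_ia(uint32_t *mem, uint32_t addr,
                const uint32_t *regs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        mem[addr / 4] = regs[i];
        addr += 4;
    }
    return addr;
}

/* Store Multiple Increment Before:
   advance by 4 first, then write. */
uint32_t stm_ib(uint32_t *mem, uint32_t addr,
                const uint32_t *regs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        addr += 4;
        mem[addr / 4] = regs[i];
    }
    return addr;
}
\end{lstlisting}

With a base address of 0, \GTT{stm\_ib} places three values at offsets 4, 8 and 12 and leaves offset 0 untouched; \GTT{stm\_ia} produces the same layout only if it is started from offset 4.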

Another thing that catches the eye is that the instructions seem to be arranged at random.
For example, the value in the \Reg{0} register is manipulated in three
places, at addresses \GTT{0x2918}, \GTT{0x2920} and \GTT{0x2928}, when it would be possible to do it in one place.

However, the optimizing compiler may have its own reasons for ordering the instructions this way, so as to achieve higher efficiency during execution.

Usually, the processor attempts to simultaneously execute instructions located side-by-side.\\
For example, instructions like \INS{MOVT R0, \#0} and
\INS{ADD R0, PC, R0} cannot be executed simultaneously since they both modify the \Reg{0} register. 
On the other hand, \INS{MOVT R0, \#0} and \INS{MOV R2, \#4} 
instructions can be executed
simultaneously since the effects of their execution are not conflicting with each other.
Presumably, the compiler tries to generate code in such a manner (wherever it is possible).
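
The conflict rule described above can be expressed as a small C predicate.
This is a deliberately simplified sketch: real processors apply many more pairing rules, and the type \GTT{insn\_regs}, the macro \GTT{R()} and the function \GTT{may\_pair()} are hypothetical names.
Each instruction is reduced to two bit masks: the registers it reads and the registers it writes.

\begin{lstlisting}[style=customc]
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t reads;   /* bit n set: the instruction reads Rn  */
    uint32_t writes;  /* bit n set: the instruction writes Rn */
} insn_regs;

#define R(n) (1u << (n))

/* Returns true if two adjacent instructions have no register
   conflict and so could, in this simplified model, be executed
   simultaneously. */
bool may_pair(insn_regs first, insn_regs second)
{
    if (first.writes & second.writes)
        return false;  /* both modify the same register */
    if (first.writes & second.reads)
        return false;  /* second needs the result of first */
    if (first.reads & second.writes)
        return false;  /* second would clobber an input of first */
    return true;
}
\end{lstlisting}

For instance, \INS{MOVT R0, \#0} can be described as \GTT{\{R(0), R(0)\}}, \INS{ADD R0, PC, R0} as \GTT{\{R(0)|R(15), R(0)\}} and \INS{MOV R2, \#4} as \GTT{\{0, R(2)\}}: the first two conflict, while the first and the third do not.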
 
\myparagraph{\OptimizingXcodeIV: \ThumbTwoMode}

\begin{lstlisting}[style=customasmARM]
__text:00002BA0               _printf_main2
__text:00002BA0
__text:00002BA0               var_1C = -0x1C
__text:00002BA0               var_18 = -0x18
__text:00002BA0               var_C  = -0xC
__text:00002BA0
__text:00002BA0 80 B5          PUSH     {R7,LR}
__text:00002BA2 6F 46          MOV      R7, SP
__text:00002BA4 85 B0          SUB      SP, SP, #0x14
__text:00002BA6 41 F2 D8 20    MOVW     R0, #0x12D8
__text:00002BAA 4F F0 07 0C    MOV.W    R12, #7
__text:00002BAE C0 F2 00 00    MOVT.W   R0, #0
__text:00002BB2 04 22          MOVS     R2, #4
__text:00002BB4 78 44          ADD      R0, PC  ; char *
__text:00002BB6 06 23          MOVS     R3, #6
__text:00002BB8 05 21          MOVS     R1, #5
__text:00002BBA 0D F1 04 0E    ADD.W    LR, SP, #0x1C+var_18
__text:00002BBE 00 92          STR      R2, [SP,#0x1C+var_1C]
__text:00002BC0 4F F0 08 09    MOV.W    R9, #8
__text:00002BC4 8E E8 0A 10    STMIA.W  LR, {R1,R3,R12}
__text:00002BC8 01 21          MOVS     R1, #1
__text:00002BCA 02 22          MOVS     R2, #2
__text:00002BCC 03 23          MOVS     R3, #3
__text:00002BCE CD F8 10 90    STR.W    R9, [SP,#0x1C+var_C]
__text:00002BD2 01 F0 0A EA    BLX      _printf
__text:00002BD6 05 B0          ADD      SP, SP, #0x14
__text:00002BD8 80 BD          POP      {R7,PC}
\end{lstlisting}

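The output is almost the same as in the previous example, with the exception that Thumb instructions are used instead.
Also, \INS{STMIA} is used instead of \INS{STMIB}: that is why \ac{LR} is first set to \GTT{SP+4} ($0x1C+var\_18=4$) and then used as the base address, so the three values end up at the same stack offsets as before.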

\myparagraph{ARM64}

\mysubparagraph{\NonOptimizing GCC (Linaro) 4.9}

\lstinputlisting[caption=\NonOptimizing GCC (Linaro) 4.9,style=customasmARM]{patterns/03_printf/ARM/ARM8_O0_EN.lst}

The first 8 arguments are passed in X- or W-registers: \ARMPCS.
A string pointer requires a 64-bit register, so it's passed in \RegX{0}.
All other values have a \Tint 32-bit type, so they are stored in the 32-bit part of the registers (W-).
The 9th argument (8) is passed via the stack.
Indeed, it's not possible to pass a large number of arguments through registers, because the number of registers is limited.

\Optimizing GCC (Linaro) 4.9 generates the same code.
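
The rule stated above can be summarized by a tiny C helper.
It is a sketch that covers only integer and pointer arguments, as in this example; the function name and its parameters are hypothetical.

\begin{lstlisting}[style=customc]
#include <stdio.h>

/* index:    0-based position of an integer or pointer argument
   is_64bit: nonzero for pointers and 64-bit integers
   Writes the location of the argument ("X0", "W3", "stack", ...)
   into buf. */
void aarch64_int_arg_location(int index, int is_64bit,
                              char *buf, size_t buflen)
{
    if (index < 8)
        snprintf(buf, buflen, "%c%d", is_64bit ? 'X' : 'W', index);
    else
        snprintf(buf, buflen, "stack");
}
\end{lstlisting}

For the \printf call in this section, index 0 (the format string pointer) maps to \RegX{0}, indices 1 to 7 map to W1 to W7, and index 8 (the value 8) maps to the stack.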
